Description

This RMD covers analysis of variance using discrete wavelet transforms. I’ll pull mostly from Gencay et al. (2001) and the waveslim package. The ultimate goal here is to categorize our inflation data along two dimensions: fine vs. coarse variance (that is, whether an item’s ‘energy’ occurs mainly at high frequencies, say a few months, or at lower frequencies, say a couple of years) and price-quantity correlation. Hopefully, this will give us something like the categories that Gardiner Means produced as administered prices versus market prices.

In the sections that follow, I will…

I’ll finish with some preliminary analysis of the relationship between administered prices vs. market prices and the pandemic inflation.

Data

The following code (not shown in markdown) will import and format the BEA data from csv tables.

Query the Data

The following function (not shown in markdown) allows querying the main tables (exp, qua, and pri) based on item levels, where goods and services are level 1; durable goods, nondurable goods, and HH cons exp on services are level 2; and so on. Often, we’ll want the lowest (i.e. most granular) level, which can be retrieved with the lowestLevel = T parameter. This is the same as in Process PCE Data.Rmd.

Plot the Data

This is an example of a plot from the data–specifically, quantities.

This will plot price data on a single plot. It’s set up to run all items, but commented out.

I looked through each of these and a few things stood out:

  • New domestic and foreign auto prices are almost identical (so much so that I initially thought the data were messed up). Maybe there’s a market governance story to be told about the relationship between the auto manufacturers and the used car dealerships
  • The meat prices rose substantially early-pandemic and, except for poultry, have begun to come down.
  • Conversely, clothing and shoes look to have dropped early pandemic and come back up since
  • Gasoline seems to be tabulated in a funny way. It’s showing recent prices at around 70% of the high levels of 2012-14, but gas prices are at about $3.40 a gallon recently, compared to about $3.70 a gallon back then according to https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=EMM_EPM0_PTE_NUS_DPG&f=M
  • The house price data seems funny to me, showing a slowing of increase in the pandemic, not a sharp uptick as in the Case-Shiller: https://fred.stlouisfed.org/series/SPCS20RSA
  • A lot of items (e.g. the three hospitals items) look to be identical. I’m guessing the BEA is just copying the more aggregate index numbers for prices to these lower level items, but that the quantity numbers will be different. This is something to check, as it will throw off the price-quantity analysis. I looked at the data in Excel and checked to see if there were any price or quantity series that were exactly equal to the one right before or after it and didn’t find any. However, there are a few that look identical but are off from each other by a small, consistent amount.
  • There looks to have been a decent rise in the price of social assistance (esp. home for the elderly and residential mental health and substance abuse services). The increase isn’t huge, but it’s something I haven’t heard about in the news.

This will plot price and quantity data on a single plot. It’s set up to run all items, but commented out. I’m going to use it to check out the issue of duplicate price series mentioned above.

Here’s the list of items sharing the same or very similar price data as the item(s) around them.

## [1] "1 New domestic autos"
## [1] "2 New foreign autos"
## [1] "32 Bicycles and accessories"
## [1] "33 Pleasure boats"
## [1] "34 Pleasure aircraft"
## [1] "35 Other recreational vehicles"
## [1] "95 Government employees' expenditures abroad"
## [1] "96 Private employees' expenditures abroad"
## [1] "98 Tenant-occupied mobile homes"
## [1] "99 Tenant-occupied stationary homes and landlord durables"
## [1] "100 Owner-occupied mobile homes"
## [1] "101 Owner-occupied stationary homes"
## [1] "103 Group housing (23)"
## [1] "112 Specialty outpatient care facilities and health and allied services"
## [1] "113 All other professional medical services"
## [1] "114 Nonprofit hospitals' services to households"
## [1] "115 Proprietary hospitals"
## [1] "116 Government hospitals"
## [1] "117 Nonprofit nursing homes' services to households"
## [1] "118 Proprietary and government nursing homes"
## [1] "120 Auto leasing"
## [1] "121 Truck leasing"
## [1] "126 Taxicabs and ride sharing services"
## [1] "127 Intracity mass transit"
## [1] "142 Casino gambling"
## [1] "143 Lotteries"
## [1] "144 Pari-mutuel net receipts"
## [1] "148 Elementary and secondary school lunches"
## [1] "149 Higher education school lunches"
## [1] "151 Meals at other eating places"
## [1] "152 Meals at drinking places"
## [1] "154 Food supplied to civilians"
## [1] "155 Food supplied to military"
## [1] "170 Household insurance premiums and premium supplements"
## [1] "171 Less: Household insurance normal losses"
## [1] "182 Proprietary and public higher education"
## [1] "183 Nonprofit private higher education services to households"
## [1] "197 Clothing repair, rental, and alterations"
## [1] "198 Repair and hire of footwear"
## [1] "211 Repair of furniture, furnishings, and floor coverings"
## [1] "212 Repair of household appliances"
## [1] "215 U.S. travel outside the United States"
## [1] "216 U.S. student expenditures"

Note that the housing price numbers aren’t identical, but they’re very similar, so I included them here. The same goes for household insurance, for U.S. travel outside the United States and U.S. student expenditures, and for new foreign and domestic autos.

Energy Analysis

Energy is defined as the sum of squared values of a vector. Energy is proportional to variance, and the discrete wavelet transform is energy (variance) preserving. Hence the sum of squared values of a time series (x) equals the sum of the squared wavelet detail coefficients (d) across all scales (1 through J), plus the sum of the squared coefficients of the smooth (s). Gencay et al. (2001, 125) write this as

\(||x||^{2} = \sum_{j = 1}^{J}||d_j||^2 + ||s_J||^2\)

where \(||\cdot||^2\) is just the sum of the squared values in the vector.

Using data from IBM’s stock returns in the 1960s, Gencay et al. (2001, 127-8) plot the wavelet energies “normalized by \(N^{-1}\),” which is to say they take the sum of squared coefficients then divide by length (i.e. number of coefficients) for each scale. Dividing by number of coefficients is necessary with a DWT because, by definition, each higher (coarser) scale will have half as many observations as the (finer) scale below it, such that finer scales will typically have higher sums of squared coefficients simply for having much larger numbers of coefficients.
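As a minimal sketch of the energy decomposition and the per-scale normalization described above, the following writes out a partial Haar DWT in base R and checks that the scale energies add up to the energy of the series. The Haar filter is used here only because it is short enough to write by hand; it is not the LA(8) filter used in this analysis. The helper `haar_dwt` and the simulated series `x` are hypothetical and are not part of the code behind this document.

```r
# Partial Haar DWT written out in base R (illustration only; the analysis
# in this document uses the LA(8) filter from waveslim instead).
haar_dwt <- function(x, J) {
  stopifnot(length(x) %% 2^J == 0)   # partial DWT: N must be a multiple of 2^J
  out <- list()
  s <- x
  for (j in seq_len(J)) {
    odd  <- s[seq(1, length(s), by = 2)]
    even <- s[seq(2, length(s), by = 2)]
    out[[paste0("d", j)]] <- (even - odd) / sqrt(2)  # detail coefficients at scale j
    s <- (even + odd) / sqrt(2)                      # smooth passed on to the next scale
  }
  out[[paste0("s", J)]] <- s
  out
}

set.seed(1)
x <- rnorm(640, sd = 0.01)           # simulated stand-in for a diff-log price series
w <- haar_dwt(x, J = 5)

energy      <- sapply(w, function(v) sum(v^2))   # ||d_j||^2 and ||s_J||^2
norm_energy <- sapply(w, function(v) mean(v^2))  # energy divided by number of coefficients

# Energy preservation: the scale energies add up to the energy of x
all.equal(sum(energy), sum(x^2))
round(energy / sum(energy), 3)       # share of total energy by scale
```

Each scale holds half as many coefficients as the one below it (320, 160, 80, 40, and 20 here), which is why `norm_energy` rather than `energy` is the quantity to compare across scales.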

In terms of choosing the wavelet, the authors also note that as “the length of the wavelet filter increases, the approximation to an ideal band-pass filter improves and therefore the wavelet filter will better capture the variability in the frequency intervals associated with the DWT wavelet coefficients.” Hence, below I will use the LA(8) wavelet, not the Haar (which has length 2).

So the idea here is to see what frequencies have the most energy, which is similar to asking whether the time series has a lot of long-period versus short-period variance. To demonstrate, the following will decompose the energy of the DWT for a single item. Note, our data has around 745 months. We may move on to MODWTs later, which don’t have this restriction, but standard DWTs need dyadic lengths (\(2^2 = 4, 2^3 = 8, 16, 32, 64, 128, 256,\) 512, &c.). However, partial DWTs (see Gencay et al. 2001, 124) should work just as well, and since we really don’t care about frequencies beyond a business cycle (which we’ll call 128 months at most), we really just need a sample size divisible by 128. This means 640 should work for us, so the following starts at the most recent month available (currently, Nov. 2021) and grabs everything 641 months back (to 1968), making 640 observations after taking the difference.

Note, we may want to consider cutting this down to start at Jan. of 1983 for two reasons: first, Gyun Gu has shown that there was a significant change in corporate pricing, a trend toward much greater stability, at around 1983; and second, data for net transactions and margins for used light trucks start in Jan. of 1983. Because the used auto market has been such a big part of the pandemic inflation (and presumably used light trucks are, too, though I haven’t confirmed that), it may be advisable to include these details. Note, also, that other important items don’t start ’til later as well, including video and audio streaming and rental (1982) and software (1977).

To start at 1983 would be 467 months up to Nov. 2021. But that’s not ideal because the closest we could get with a partial DWT would be 384 months, running from late 1989 to the present. We could go down to a J = 6 (64 months) partial DWT, which would allow for 448 months; or we could strictly use MODWTs. But, having finished this Rmd at this point, I’d recommend using the shorter period (384 months) for the energy analysis below, then the longer period (1983 and on) for the price-quantity correlation, since it relies on MODWT.
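The sample-length arithmetic above can be checked with a short sketch. The helper `usable_length` is a hypothetical name introduced for illustration; it just returns the largest multiple of \(2^J\) that fits in the available sample, which is the length a level-J partial DWT can use.

```r
# Largest sample length usable by a level-J partial DWT: the biggest
# multiple of 2^J that does not exceed the number of available observations.
usable_length <- function(n_available, J) {
  (n_available %/% 2^J) * 2^J
}

usable_length(745, J = 7)  # 640: full sample, scales up to 128 months
usable_length(467, J = 7)  # 384: starting from Jan. 1983
usable_length(467, J = 6)  # 448: same start, one fewer scale

# Keep the most recent observations of a series so its length fits
x <- seq_len(745)                                # placeholder series
x_trim <- tail(x, usable_length(length(x), J = 7))
length(x_trim)                                   # 640
```

Trimming from the front with `tail` keeps the most recent months, which matters here because the pandemic period sits at the end of the sample.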

The following code will produce the breakdown of energies by scale for the price data of eggs as an example.

You can see that eggs have a fairly high degree of energy (for reference, the average scale energy, excluding the smooth, across all scales and items came out to about 0.00016, partly raised by extremely high energies for securities commissions). But you can also see that most of that energy is in scales 4 and 5, which correspond to 16-32 and 32-64 month periods; i.e. we’re looking mostly at price changes over periods of roughly 1-5 years.

Scale Energies for All Items

The following (not shown in markdown) will produce a table of all (least aggregated) items for this time period, including whether they’re durable goods, nondurable goods, or services, and their energies by scale (not including the smooth). The table is ordered by d1 energy (that is, the energy of the finest scale, which represents price changes over 2-4 months).

And here are some histograms (one for each scale) to show the distribution of items in terms of energy. (Note, the 3 items with very high energies, televisions, other video equipment, and securities commissions, are not included).

The following simply gives the items’ ranks in terms of energies at each scale (where rank 1 indicates lowest energy among all items at that scale). The table is ordered by the fourth scale, but it is not printed in the markdown.

Ratio of Detail Energies

This might not be much help, but the aim is to find a clear cut-off point between high-energy items (i.e. market items, presumably with most of their energy at short periods, that is, the first scale or two) and low-energy items (i.e. administered prices). To that end, here are histograms that sum the energies for the first two details and for the 3rd-5th details.

It seems like items with combined energies at the first two scales of less than 0.0004 might appropriately be called administered prices, with those greater being called market prices. But, we’ll want to look into the matter more before calling that the distinguishing characteristic.

Out of curiosity, I’m going to build a table of items and the ratio of their 1st to 4th scale energies. My thinking here is that there might be groups of items with high 2-4 month energies (the first scale) but low 16-32 month energies (the market prices), items with the opposite (the admin prices), and items with either high or low energies at both scales (the indeterminate). So I’m looking for a bimodal distribution here.

They seem to be more or less normally distributed around 1, that is, approximately the same energy at scale 1 as scale 4 (this is roughly true for scale 1 to scales 2, 3, and 5 as well). A ratio of 1.5 or 2 might make for a good cutoff (ratios above being market prices, below being admin prices), but it seems pretty arbitrary to me.

The ratio of the first detail energy to the fifth (32-64 months) is similar, so let’s just pick a ratio here of 2 as the cutoff.
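A minimal sketch of the ratio rule just described, using a cutoff of 2 on the ratio of first-scale to fifth-scale energy. The item names and energy values below are made up for illustration, and the column names are hypothetical; the real table is built from the per-item scale energies computed earlier.

```r
# Hypothetical per-item normalized energies at scales 1 and 5 (made-up numbers)
energies <- data.frame(
  item = c("A", "B", "C", "D"),
  d1   = c(4e-04, 1e-04, 2e-05, 3e-04),
  d5   = c(1e-04, 2e-04, 4e-05, 1e-06)
)

cutoff <- 2
energies$ratio_d1_d5 <- energies$d1 / energies$d5
energies$market <- as.integer(energies$ratio_d1_d5 > cutoff)  # 1 = market, 0 = admin
energies
```

Item D illustrates a weakness of any ratio rule: a very small scale-5 energy in the denominator produces a very large ratio regardless of how much energy the item has at scale 1.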

This seems like it might be a workable approach as the 5th scale (representing about 4 years) (1) is long enough to clearly represent the sort of long-term price changes we’d expect to see from administered prices, (2) is substantially longer than one year, such that we aren’t picking up seasonal pricing, but (3) isn’t so long as to potentially pick up business cycle fluctuations.

After looking at the items that this approach would categorize as market, I don’t think this is a good approach. I’m guessing the issue is that some items have very low scale 5 energies.

Price-Quantity Correlations

As Gencay et al. (2001, 241) explain, for a series \(\textbf{x} = (x_0, x_1, ..., x_{N-1})\) of length N, a MODWT of order J produces wavelet coefficients \(\tilde{w}\). An unbiased estimator of the wavelet variance is given by:

\[\tilde{\sigma}^{2}_{x}(\lambda_j) = \frac{1}{\tilde{N}_j} \sum_{t=L_j-1}^{N-1} \tilde{w}^{2}_{j,t}\]

where \(L_j = (2^j - 1)(L - 1) + 1\) is the length of the scale \(\lambda_j\) wavelet filter, \(L\) is the length of the base wavelet filter (e.g. 8 for LA(8)), and \(\tilde{N}_j = N - L_j + 1\) is the number of coefficients unaffected by the boundary. Confidence intervals can be estimated for the above in a variety of ways (see Gencay et al. 2001, 242-4).

Wavelet covariance and correlation (i.e. for each scale of the wavelet transform) can similarly be estimated for a bivariate time series, which for our study means price and quantity for a given item. Likewise, lags and leads can be introduced to obtain wavelet cross-covariance and cross-correlation; however, because DWTs are not translation invariant, these require use of the MODWT (see Gencay et al. 2001, 252-3). The unbiased estimator for the wavelet covariance of a bivariate series \(X = ((x_{1,0}, x_{2,0}), (x_{1,1}, x_{2,1}), ..., (x_{1,N-1}, x_{2,N-1}))\), with MODWT coefficients \(\tilde{w}_1\) and \(\tilde{w}_2\) for the two series, is given by:

\[\tilde{\gamma}_{X}(\lambda_j) = \frac{1}{\tilde{N}_j} \sum_{t=L_j-1}^{N-1} \tilde{w}_{1,j,t} \tilde{w}_{2,j,t}\]

(Gencay et al. 2001, 253). Confidence intervals can be estimated for wavelet covariance as in Gencay et al. (2001, 254-5).

Finally, wavelet correlation simply normalizes the wavelet covariance by the wavelet standard deviations (the square roots of the wavelet variances) of the two series:

\[\rho_X(\lambda_j) = \frac{\gamma_X(\lambda_j)}{\sigma_1(\lambda_j)\sigma_2(\lambda_j)}\]

which, as usual, will take a value between -1 and 1. The biased estimator for this can be computed with the previous equations (Gencay et al. 2001, 258-9), and confidence intervals can be calculated as in Gencay et al. (2001, 259-60).
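To make the estimators above concrete, here is a minimal base-R sketch that applies them with the Haar filter (\(L = 2\), so \(L_j = 2^j\)). This is an assumption made so the level-j filter can be written out by hand; the analysis in this document relies on waveslim and a longer filter. The helpers `haar_modwt_level` and `wavelet_cor` are hypothetical names, and the two series are simulated rather than BEA data.

```r
# Level-j Haar MODWT coefficients. The first L_j - 1 values, which would
# depend on the boundary, come back as NA and are dropped by the estimators.
haar_modwt_level <- function(x, j) {
  half <- 2^(j - 1)
  h <- c(rep(1, half), rep(-1, half)) / 2^j   # level-j Haar wavelet filter, length L_j = 2^j
  as.numeric(stats::filter(x, h, sides = 1))
}

# Wavelet variance, covariance, and correlation at scale j, averaging over
# the N - L_j + 1 coefficients unaffected by the boundary.
wavelet_cor <- function(x1, x2, j) {
  w1 <- haar_modwt_level(x1, j)
  w2 <- haar_modwt_level(x2, j)
  ok <- !is.na(w1)
  v1  <- mean(w1[ok]^2)
  v2  <- mean(w2[ok]^2)
  cov <- mean(w1[ok] * w2[ok])
  c(var1 = v1, var2 = v2, cov = cov, cor = cov / sqrt(v1 * v2))
}

set.seed(42)
n <- 467
common   <- rnorm(n)
price    <- 0.01 * (common + rnorm(n))        # simulated diff-log price
quantity <- 0.01 * (-common + rnorm(n))       # built to move against price

res <- t(sapply(1:5, function(j) wavelet_cor(price, quantity, j)))
rownames(res) <- paste0("scale", 1:5)
round(res[, "cor"], 2)
```

Because the simulated quantity series is constructed to move against price, the correlations come out negative at the finest scale, which is exactly the case a 0-to-1 range would rule out.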

To demonstrate, below are, for three items, the original series (difference of logs) and MODWT coefficients (first plot: price; second: quantity) and cross correlation plots with red lines indicating 95% confidence intervals. The code for the correlation component of the following is taken directly from the waveslim documentation (for the function spin.covariance) and recreates Figure 7.9 in Gencay et al. (2001, 261) but using our price and quantity data.

First, for gasoline.

Note that price and quantity for gasoline are fairly correlated at coarser scales with some significant correlations at various lags and leads. For instance, there is a negative contemporaneous (zero lag) correlation at scale four, which can be interpreted as: the price of gasoline and the quantity sold tend to move in opposite directions over a period of 16-32 months. A similar pattern in scale 3 suggests that this might be related to seasonal patterns.

The correlations for gas are not as high as one might expect, though. A high price-quantity correlation that would more clearly indicate a market price is evident for fresh fruit:

In contrast, new domestic autos show some slight correlation at finer scales (perhaps due to semiannual sales? or negotiations at the dealership?), but otherwise no correlation.

Based on the above approach, the following code (not displayed in markdown) will build a table of price-quantity correlations for all items.

I’ll set the start date to Jan 1983.

Pandemic Inflation

Lastly, and just for good measure, I’ll calculate a couple of basic metrics for the extent of inflation per item during the pandemic. The following gives the percentage change in price for each item between January 2020 and the most recent month (currently Nov. 2021). Also, I’ll calculate the % change between the average monthly price change in the 10 years prior to the pandemic and the average monthly price change during the pandemic. Histograms are plotted, because who doesn’t like a histogram, though I’m not presently comparing it to other periods. Also shown are the highest 10 items.

##                                                      Item Pan.Price.Pct.Chg
## 207                                     Domestic services          11.62545
## 209 Repair of furniture, furnishings, and floor coverings          11.78749
## 210                        Repair of household appliances          11.78844
## 82                      Flowers, seeds, and potted plants          11.87176
## 208                 Moving, storage, and freight services          12.01004
## 54                                                   Eggs          12.13912
## 51                                       Fish and seafood          12.16087
## 1                                      New domestic autos          12.24893
## 2                                       New foreign autos          12.24954
## 33                                         Pleasure boats          12.32813
## 32                               Bicycles and accessories          12.32824
## 34                                      Pleasure aircraft          12.32837
## 35                            Other recreational vehicles          12.32857
## 97     Less: Personal remittances in kind to nonresidents          12.44174
## 11                                              Furniture          12.68718
## 160                                         Pension funds          12.69235
## 3                                        New light trucks          12.76665
## 150                Meals at limited service eating places          13.02493
## 92                                          Tobacco (127)          13.05817
## 95              Government employees' expenditures abroad          13.40133
## 96                 Private employees' expenditures abroad          13.40145
## 50                                                Poultry          14.88862
## 165   Portfolio management and investment advice services          17.62670
## 163                                  Indirect commissions          19.17813
## 48                                                   Pork          20.50268
## 15                             Major household appliances          21.90615
## 66                Food produced and consumed on farms (6)          24.17428
## 75                                               Fuel oil          24.20154
## 47                                          Beef and veal          27.04562
## 107                                      Natural gas (28)          32.55022
## 73                          Gasoline and other motor fuel          33.25851
## 76                                            Other fuels          33.90403
## 7                         Net transactions in used trucks          47.58872
## 4                          Net transactions in used autos          47.58888
## 6                                  Employee reimbursement          50.79947
## 122                                  Motor vehicle rental          50.80002
## 8                                       Used truck margin          79.83953
## 5                                        Used auto margin          79.84207

The items that have been making headlines are clear in the first metric, most notably the used car market, and within that the used auto margin (the difference between what dealers pay for used cars and what they charge). The same is true for used light trucks. Meats are also showing up, especially beef; but so are major household appliances.

One point of interest is the negative side: the government-supplied food items (school lunches, &c.) are registered with almost the exact same roughly 50% decline in prices. This is probably an anomaly of the survey method, but it’s worth asking: how much is this depressing the headline inflation number?

As for the second metric, you can see in the last plot (with the 45 degree line for parity) that points are clustered around the diagonal line, indicating that most items saw similar average monthly changes during the pandemic as they did in the decade prior, but also that the changes were usually modestly higher during the pandemic than before it. But there are also outliers. I’ll look at those with ggplots below.
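For reference, the two pandemic metrics discussed in this section can be sketched for a single item as follows. The price index here is simulated with made-up growth rates, and all variable names are hypothetical; the actual calculation runs over every item in the BEA price table.

```r
# Simulated monthly price index for one item, Jan. 2010 to Nov. 2021
# (made-up growth rates, not BEA data).
dates <- seq(as.Date("2010-01-01"), as.Date("2021-11-01"), by = "month")
pandemic_start <- as.Date("2020-01-01")
growth <- ifelse(dates[-1] > pandemic_start, 0.004, 0.002)  # monthly log change
price  <- 100 * exp(c(0, cumsum(growth)))

# Metric 1: total percentage price change from Jan. 2020 to the latest month
p0 <- price[dates == pandemic_start]
pan_price_pct_chg <- 100 * (price[length(price)] / p0 - 1)

# Metric 2: average monthly percentage change, decade prior vs. pandemic
monthly_pct <- 100 * (price[-1] / price[-length(price)] - 1)
is_pandemic <- dates[-1] > pandemic_start
avg_pre <- mean(monthly_pct[!is_pandemic])
avg_pan <- mean(monthly_pct[is_pandemic])
pct_chg_in_avg <- 100 * (avg_pan / avg_pre - 1)

c(total = pan_price_pct_chg, pre = avg_pre, pandemic = avg_pan, change = pct_chg_in_avg)
```

One caveat with the second metric as sketched: it divides by the pre-pandemic average, so it becomes unstable for any item whose average monthly change in the prior decade is close to zero.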

Build the Admin vs. Market Price Table

Lastly, we’ll build a table of all the items and whether they’re classified as market or admin prices according to the different methods developed above. These are (for market prices)…

In the table below for columns 2 and 3, 1 indicates market item, 0 indicates admin.

Following that in the table are columns for the highest contemporaneous price-quantity correlation and the scale of that correlation, the same for lagged correlations, then the percentage price change between Jan. 2020 and the most recent available month (Nov. 2021), and lastly the price-quantity beta coefficient and p-value for the simple regression. The table is ordered according to the price change.

Categorizing Administered vs. Market Prices

First, I’ll check to see how the price-quantity correlation compares to the simple regression p-value.

Interestingly, the two don’t match up as much as I would have expected. In fact, while 150 of the 192 items have p-values of less than 0.01 associated with the beta coefficient of the regression, only 45 have correlations greater than 0.8 in the wavelet scale with the highest correlation. Of course, the regression results are rudimentary, but this calls for further methodological study in any case.

The following compares the energy of the first two scales and the 1st to 5th scale ratio to the highest correlation.

Well, that’s not the best way to plot that, but for now it’s good enough to make me think the energy approach isn’t very useful.

All in all, I think the price-quantity approach is best suited for distinguishing market from administered prices, though energy could be useful in related analyses. It’s worth noting, though, that to the extent that the low-scale (i.e. high-frequency) energies don’t tell us much, this would suggest that the traditional approach of counting months with zero price change is probably also not a very sound method.

In fact, the wavelet method in general suggests a more suitable approach, as it allows for recognition of price changes at different frequencies, which itself is similar to the traditional approach, but with more detailed information about those frequencies. That is, whereas the traditional approach can only tell us how frequently there is no price change, the wavelet approach can tell us, e.g., how much or how little of total price volatility is made up of frequent price changes (say in the 2-4 month range). It seems to me that this is at least as sound an approach as the traditional approach, plus it lends additional information. For instance, prices may change at a lower frequency, reflecting the planning processes of the price administrators (though it must also be noted that price changes at higher frequency may also reflect those planning processes, e.g. temporary sales).

Administered vs. Market Prices and the Pandemic Inflation

I want to do a quick check of the pandemic price change against price-quantity correlation. Here’s a plot, along with some zooms of the same:

[Note: the analysis below refers to the original correlation numbers, which were run on the longer dataset. These have changed now that we’re only selecting from 1983 on.]

The horizontal axis alone is kind of interesting: items with higher correlations tend to have those correlations at coarser scales (lower frequency). An argument could be made that the only proper market prices are those with high correlations at finer scales (i.e. the darker colored dots that are also toward the left or right extremes on the above plots), although as noted earlier, this could also reflect sales among administered price items.

More generally, nothing in the plots above jumps out at me as showing that the inflation is either the result of market prices alone, or administered prices generally. Rather, it looks like certain items have seen particularly high inflation (which is consistent with what many have been reporting), including especially used car dealerships, meat processors, and household appliance producers. This perspective does raise additional questions, though. For instance, the price rise for processed and fresh fruits has been about the same, despite the latter clearly being more market-priced and the former more administered; so what’s going on with meat?

And, of course, what’s going on with used cars? Plot 7 above shows most of them. Note that new domestic autos have some p-q correlation (foreign autos do not), and the used auto margin falls in between (although net transactions in used autos and used light trucks have greater correlations); yet the inflation for new autos has been on the high end of the cluster of most items, but nowhere near where the used autos are. That demands an explanation, and I have trouble saying it’s all chip shortages (but…maybe?).

Lastly, of course, this is just one way to look at the pandemic inflation. We’ll develop others (including the difference-in-difference approach), and I think we can simply compare those results to the correlation values we’re looking at above.

Here’s the plot using the other metric for the pandemic inflation (% change in average monthly price changes, pandemic vs. decade prior):

(Note: wine is way off the scale so it’s been removed from view)

And one more pass to look at the average monthly price change pre- versus during pandemic. Average monthly change in price in the decade prior to the pandemic is on the x axis, and during the pandemic is on the y axis. The color reflects the correlation between price and quantity at the finest (2-4 month) scale.

Here’s the same thing but with color reflecting highest correlation value.

Here’s the same thing but with color reflecting highest correlation value, only showing items with |highest correlation| > 0.5.

And the same again, but with correlations <= 0.5